Short chain dehydrogenases/reductases (SDR)

ABSTRACT

The present invention relates to a method for identifying or verifying members of the short chain dehydrogenase (SDR) family, to a method for providing modulators for members of the SDR family and to the preparation of pharmaceutical agents using these modulators.

The present invention relates to a method for identifying or verifying members of the short chain dehydrogenase (SDR) family, to identified SDRs, to a method for providing modulators for members of the SDR family and to the preparation of pharmaceutical agents using these modulators.

TECHNICAL BACKGROUND

The short chain dehydrogenase/reductase (SDR) protein family (H. Jörnvall et al., Biochemistry 34 (1995), 6003-6013) is an old conserved protein family, the members of which show a residue identity level of only 20-30%. However, it has been found that the three-dimensional structures of members of the SDR family are highly similar, determining their functions and affiliation to the SDR family (U. Oppermann et al., Enzymology and Molecular Biology of Carbonyl Metabolism 6, Weiner et al. eds., Plenum Press, New York (1996), p. 403-415).

While initially only two structures of SDR enzymes, restricted to bacterial and insect enzymes, had been discovered, rapid progress in the knowledge of short chain dehydrogenases/reductases resulted in an increasing number of structures which could be assigned to the SDR family. Currently, about 1,600 putative members are known, of which up to 100 may be of human origin, such as hydroxysteroid dehydrogenases (HSD).

An approach to identify SDR proteins is described in W. N. Grundy et al., Biochemical and Biophysical Research Communications 231 (1997) 760-766 and in T. L. Bailey et al., J. Steroid Biochem. Molec. Biol. 62 (1) (1997) 29-44. Therein, homologies are searched for via a hidden Markov model, i.e. a self-training model, and sequences are thus assigned to a certain protein family. A classification based on the function is not made in these models.

Since the SDR enzymes are involved in various metabolic pathways and show different activities, such as oxidoreductases, lyases or epimerases, and, as discussed above, show only a low identity of 20-30%, it has been difficult to assign new members unambiguously to the SDR family and to find modulators therefor.

However, since HSDs and other SDRs play a critical role in higher vertebrates, it is desirable to discover further members of the SDR family and establish modulators for known and new SDR enzymes.

It was therefore an object of the present invention to provide an algorithm which allows for the identification or verification of SDR family members with high confidence levels.

It was a further object of the invention to provide an algorithm which provides a search hierarchy with various levels.

It was another object of the present invention to provide modulators for SDR family members.

Still another object of the invention was to provide pharmaceutical agents based on members of the SDR family.

SUMMARY OF THE INVENTION

The present invention relates to a method for identifying or verifying members of the short chain dehydrogenase (SDR) family based on an algorithm using core SDR motifs for searching members of the SDR family. Further, the present invention relates to a method for providing modulators for such members of the short chain dehydrogenase (SDR) family, which enhance or inhibit the activity thereof, as well as a method for providing a pharmaceutical agent using modulators for members of the SDR family.

In particular the present invention provides a combination of the steps of (i) screening databases to search for and find SDR sequences, (ii) storing the data on an appropriate medium, ranking and validating the hits and (iii) using the SDR sequences found to develop new drugs.

DETAILED DESCRIPTION OF THE INVENTION

Members of the SDR protein family have a common core sequence, which is about 250-350, preferably about 260-290 and in particular about 270 amino acids in length. SDR proteins can have extensions at the N-terminus and/or at the C-terminus. Typically, these extensions have a length of 20 to several hundred, in particular up to 500 amino acids. These extensions can be membrane anchors or other signals or they can constitute completely distinct protein domains. Therefore, according to the invention the search is primarily directed at SDR core domains, the rest of the protein being analysed only later on.

In a first embodiment the invention provides a method for identifying or verifying members of the short chain dehydrogenase (SDR) family comprising the steps

-   (a) providing a target sequence of molecules to be classified,
-   (b) comparing said target sequence with core SDR motifs selected from
    -   (i) MV1 being derived from the motif MT1:TGxxxGxG by replacement of 0 to 2 amino acids,
    -   (ii) MT2:NN(0-2:x)AG,
    -   (iii) MT3:N, located at a position 90-110 relative to MT1,
    -   (iv) MV4 being derived from the motif MT4:S(11-52:x)YxxxK by replacement of 0-2 amino acids and
    -   (v) MT5:PG,
-   (c) determining positive SDR candidates containing
    -   (i) at least the core SDR motifs MV1 and MV4 and
    -   (ii) at least 7 of the 14 amino acids contained in the motifs MT1, MT2, MT3, MT4 and MT5 and
-   (d) classifying positive SDR candidates as belonging to the SDR family.

It has been found in many SDR proteins that several motifs of the SDR core domain often occur in combination. However, it is not obligatory that all SDR core motifs are present for a protein to be an SDR enzyme. Since SDR proteins may lack one or several of the core SDR motifs, they may not be found by simple comparison of the complete SDR core domains.

Within the SDR core the following functional motifs are frequently found. The motifs are given in order from N-terminus to C-terminus, assigning a position number 0 to the start of the first motif MT1, which of course need not be the start of the complete SDR protein.

-   MT1:TGxxxGxG (circa position 0-7);
-   MT2:NNAG (circa position 75-78);
-   MT3:N (circa position 100);
-   MT4:S-Y-K (circa positions 128/142/146) and
-   MT5:PG (circa position 170/171).

Using these motifs, the algorithm according to the invention has been developed, which allows a target sequence to be assigned as an SDR sequence with a confidence level of more than 95%, in particular more than 98%. By relying on motifs of the core SDR region, positive hits due to identity in non-significant regions can be excluded. It is essential for the present invention that the core SDR motifs were selected because of their functional meaning and not only because of homology comparisons. The SDR motifs used form essential parts of the nucleotide co-factor binding region (Rossmann fold) and the active site of members of the SDR family. The motifs MT1 and MT2 represent components of the co-factor binding site. A particular co-factor of SDR enzymes is NAD(P)(H). The motif MT3 represents a contact to the active site and the motif MT4 a part of the active site. The motif MT5 is of functional importance due to its proximity to the co-factor. Thus core SDR motifs are motifs which are essential for the functionality of the SDRs.

For detecting members of the short chain dehydrogenase (SDR) family in the method according to the invention it is therefore essential that functional aspects are considered, wherein enzymatically active SDRs are detected and not only sequences which exhibit a certain homology to other SDRs at functionally irrelevant positions.

Contrary to prior art algorithms, according to the invention those amino acids are taken into account which are essential for the function. Selecting a minimum number of amino acids thus allows a maximum number of targets to be found despite the divergence of the SDR family, while the detection of false positive targets is basically excluded because of the connection between function and structure. This way the target specificity can be considerably improved over algorithms, such as neural networks, which are based on homology comparisons (cf. J. A. Gerlt et al., Genome Biology, 1 (5) (2000), Reviews 0005.1-0005.10). In addition, further functional information can easily be included in order to screen for functional deficits, such as screening for disease-associated mutations or individualized drug metabolism.

While the individual proteins assigned to the SDR family using the algorithm of the invention may have identities of only 30% or less, they show a very similar three-dimensional structure. It is important for the correct formation of the desired three-dimensional SDR structure that motifs 1 to 5 are present in the above listed succession.

For the description of the motifs the single letter amino acid code is used. x denotes a variable amino acid, selected preferably from the 20 naturally occurring amino acids. NN(0-2:x)AG means that 0, 1 or 2 amino acids can be positioned between amino acids N and A. S(11-52:x)YxxxK means that from 11 to 52 amino acids are positioned between S and Y and 3 amino acids are positioned between Y and K.
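The notation described above maps directly onto regular expressions. The following is a minimal sketch, assuming that x may be any character and that a ranged gap (m-n:x) corresponds to a bounded repetition; the function name `notation_to_regex` is hypothetical and is not part of the described implementation. It covers exact matches only, not the replacements discussed below.

```python
import re


def notation_to_regex(motif: str) -> str:
    """Translate the motif notation of the text into a regular expression.

    Assumption: 'x' stands for any single residue ('.'), and a ranged gap
    such as '(0-2:x)' stands for between m and n arbitrary residues.
    """
    # Convert ranged gaps first, e.g. '(11-52:x)' -> '.{11,52}'.
    pattern = re.sub(r"\((\d+)-(\d+):x\)", r".{\1,\2}", motif)
    # Remaining single 'x' characters are wildcards.
    return pattern.replace("x", ".")


mt2 = notation_to_regex("NN(0-2:x)AG")      # 'NN.{0,2}AG'
mt4 = notation_to_regex("S(11-52:x)YxxxK")  # 'S.{11,52}Y...K'
```

With these patterns, `re.search(mt2, sequence)` would report an exact occurrence of MT2 with 0, 1 or 2 residues between NN and AG.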

A replacement of 0-2 amino acids refers to a replacement of any of the amino acids given (including x), whereby preferably the explicitly named amino acids are replaced. A replacement includes deletion of the amino acid or a substitution of the amino acid by another amino acid selected preferably from the 20 naturally occurring amino acids. The replacement of 1 or 2 amino acids results in a fuzzy logic including also sequences in which the motifs are not 100% conserved. A strategy combining sequence and structure information is also disclosed by L. Yu et al., Protein Science 7 (1998), 2499-2510.
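As an illustration of matching with replacements, the sketch below compares a motif against a sequence segment of equal length and counts how many explicitly named residues are conserved. It is a simplification under stated assumptions: only substitutions are handled (deletions, which the text also allows, are not), and the name `match_with_replacements` is hypothetical.

```python
def match_with_replacements(motif, segment, max_replacements=2):
    """Return the number of conserved named residues, or None if no match.

    Assumption: only substitutions of explicitly named residues are counted;
    positions marked 'x' in the motif accept any residue.
    """
    if len(segment) != len(motif):
        return None
    # Pair up only the explicitly named motif positions with the segment.
    named = [(m, s) for m, s in zip(motif, segment) if m != "x"]
    mismatches = sum(1 for m, s in named if m != s)
    if mismatches > max_replacements:
        return None
    return len(named) - mismatches


# MT1 has four named residues (T, G, G, G).
full = match_with_replacements("TGxxxGxG", "TGAAAGAG")     # 4
partial = match_with_replacements("TGxxxGxG", "TAAAAGAS")  # 2
```

The returned count is the contribution of this motif to the overall score of named residues.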

In a preferred embodiment of the invention MT2 is defined to be NNAG (i.e. without any amino acids x between NN and AG), but with possible replacement of 1-3 amino acids.

The motif MT3:N is located at position 90-110, preferably at position 95-105 and in particular at position 100 relative to the start of the motif MT1.

In a particularly preferred embodiment of the invention the second part of motif MT4 is defined to be the pattern YxASK with possible replacement of up to 3 of these residues. In this preferred embodiment the range of possible scores is extended from 0-14 up to 0-16. In this embodiment positive candidates have a score of at least 7, preferably at least 9, more preferably at least 11 and most preferably at least 13.

Preferably the SDR motifs are located in the order given from the N-terminus to the C-terminus for a sequence to be classified as SDR sequence. The positions given in brackets above may be shifted by amino acid insertions or deletions within the sequence analyzed. Preferably the motifs are found within ±50, more preferably ±20 positions, in particular ±10 positions and most preferably ±5 positions, from the values given.

A target sequence is classified as belonging to the SDR family according to the invention if it contains at least the core SDR motifs MV1 and MV4 and at least 7 of the 14 explicitly named amino acids contained in the motifs MT1, MT2, MT3, MT4 and MT5. The confidence level can be controlled by varying the number of matching amino acids which have to be present in the target sequence. Therefore, if a high confidence level, e.g. >98%, more preferably >99%, is desired, it may be preferable to classify target sequences as positive SDR candidates only if they contain at least 9 of the 14 amino acids, or even at least 11 or at least 12 of the 14 amino acids contained in the motifs MT1-MT5. Setting the score at a value of at least 13 results in the detection of exclusively sequences which are an SDR with a confidence level of almost 100%, e.g. >99.8%.
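The classification rule stated above can be written as a small predicate. This is only a sketch: the function name `is_positive_candidate` and its parameters are hypothetical, and the threshold is left adjustable because the text describes raising it to obtain a higher confidence level.

```python
def is_positive_candidate(has_mv1: bool, has_mv4: bool, score: int,
                          min_score: int = 7) -> bool:
    """Apply the rule: MV1 and MV4 present, and enough named residues matched.

    'score' is the number of explicitly named residues of MT1-MT5 found in
    the target sequence (0-14); 'min_score' defaults to 7 as in the text.
    """
    return has_mv1 and has_mv4 and score >= min_score


# A stricter threshold, e.g. 13, trades sensitivity for confidence.
strict = is_positive_candidate(True, True, 12, min_score=13)  # False
```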

In a preferred implementation of the method of the invention a file is provided containing a set of protein amino acid sequences, the input set. One sequence is taken from the input set, the query sequence. The implementation then passes the query sequence to the algorithm, which examines it for occurrences of some or all of MT1-5 in the arrangements allowed. The algorithm returns a list of the best possible combinations of occurrences. If the matches contain more than a specified number of amino acids from MT1-5, they are assigned as hits.

In a particularly preferred embodiment the method of the invention is as follows:

The algorithm first searches the whole sequence for instances of the first motif MT1, allowing for up to two replacements as described. Each possible MT1 match is then taken as the origin for searches for the motifs MT2 to MT5, whose positions are defined relative to the position of MT1. A data structure based on each position of MT1 is created, which will be used to store the positions of other motifs relative to this MT1.

For a given MT1 match at position P, the preferred position of the motif MT2 is P+75. According to the rules MT2 is preferably at position (P+75)+/−50, more preferably (P+75)+/−20. This defines a window on the sequence within which instances of the motif MT2 are searched for, including any variants of MT2 with up to three replacements. Since the size of the window affects the time taken to search and the quality of the matches found, the preferred implementation allows the window sizes to be specified for each search. Any possible matches within the window are added to the result data structure as children of the current MT1.
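The window search described above might be sketched as follows. This is an assumption-laden illustration rather than the implementation of the invention: the function name `find_in_window` is hypothetical, positions are 0-based, only substitutions are considered, and hits are returned as a plain list instead of being attached to a tree node.

```python
def find_in_window(seq, motif, p, offset, half_width=20, max_replacements=3):
    """List (position, matched_residues) for motif starts inside the window.

    The window is centred on p + offset and extends half_width positions to
    either side, clipped to the bounds of the sequence.
    """
    named = sum(1 for m in motif if m != "x")
    lo = max(0, p + offset - half_width)
    hi = min(len(seq) - len(motif), p + offset + half_width)
    hits = []
    for i in range(lo, hi + 1):
        segment = seq[i:i + len(motif)]
        matched = sum(1 for m, s in zip(motif, segment) if m != "x" and m == s)
        # Accept the position if few enough named residues were replaced.
        if named - matched <= max_replacements:
            hits.append((i, matched))
    return hits


# Toy sequence with NNAG placed 75 residues after an MT1 match at P = 0.
toy = "L" * 75 + "NNAG" + "L" * 30
mt2_hits = find_in_window(toy, "NNAG", p=0, offset=75)
```

Because up to three of the four MT2 residues may be replaced, neighbouring positions that share a single residue also qualify; the later scoring step is what favours the fully conserved occurrence.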

The procedure is then repeated for instances of MT3, where the window is (P+100)+/−50, more preferably (P+100)+/−20, or any other specified window size. Again, any results found within the allowed window are added to the result data structure as children of the current MT1.

The same procedure is then followed for MT4, with a window (P+128)+/−50, more preferably (P+128)+/−20, or any other specified window size. In this case the window only specifies the position of the Serine residue of MT4, and once a candidate Serine has been found at position P_(S) (and added to the result structure as a child of the current MT1), it defines a window P_(S)+(11-52), within which instances of the second part of MT4 are searched for, allowing for replacements. Any candidates are added to the result structure as children of the current Serine match.

Since MT4 allows replacements, and those replacements could include replacements of the Serine, the implementation additionally searches for the second part of MT4 in cases where the Serine is not found. In this case, a virtual window composed of all of the possible positions of the (missing) Serine, offset by the P_(S)+(11-52), is constructed [P_(S)+(11-52)+/−20, i.e. the range P+128+(11-20) to P+128+(52+20)], or likewise for any specified window size. If any instances of the second part are found they are added to the result data structure as children of the current MT1.
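The arithmetic of the virtual window can be made explicit with a short sketch. It follows the formula exactly as written in the text (offsets 11 and 52 added to the serine position); whether the second part of MT4 starts one residue later is not specified here and is left open. The function name `virtual_mt4_window` is hypothetical.

```python
def virtual_mt4_window(p, serine_offset=128, gap=(11, 52), half_width=20):
    """Return (lowest, highest) position for the second part of MT4.

    Used when no serine was found: every possible serine position
    p + serine_offset +/- half_width is shifted by the allowed gap range.
    """
    lowest = p + serine_offset + gap[0] - half_width
    highest = p + serine_offset + gap[1] + half_width
    return lowest, highest


# For an MT1 match at P = 0 and a +/-20 window: P+128+(11-20) to P+128+(52+20).
bounds = virtual_mt4_window(0)  # (119, 200)
```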

The procedure is then repeated for instances of MT5, where the window is (P+170)+/−50, more preferably (P+170)+/−20, or any other specified window size. Again, any results found within the allowed window are added to the result data structure as children of the current MT1.

At this stage the implementation holds in memory a tree-structured data structure where the possible matches with the specified pattern correspond to depth-first traversals of the tree. The implementation enumerates the possible combinations of the full or partial motifs, adds up a score calculated from the number of residues in the motifs which were actually matched, and discards the instances where the overlapping windows have given rise to motifs where the ordering is not MT1-MT2-MT3-MT4-MT5. Any combination with a score equal to the maximum score found is kept and added to a list, and it is this list with its score, the motifs found, and the position in the sequence of each amino acid matched which is returned as the result at this stage of the implementation.

The preferred implementation includes significant enhancements, in particular:
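The enumeration step might look like the following sketch. It is a simplification under stated assumptions: the tree is flattened into one candidate list per motif, MT4 is treated as a single hit, a motif may be absent from a combination, and the name `best_combinations` is hypothetical.

```python
from itertools import product


def best_combinations(candidates):
    """Return (best_score, combinations) over ordered motif combinations.

    'candidates' maps a motif name to a list of (position, matched_residues).
    None in a combination marks a motif that is absent.
    """
    order = ["MT1", "MT2", "MT3", "MT4", "MT5"]
    options = [[None] + list(candidates.get(name, [])) for name in order]
    best, best_score = [], -1
    for combo in product(*options):
        positions = [hit[0] for hit in combo if hit is not None]
        # Discard combinations whose motifs are not in MT1..MT5 order.
        if any(a >= b for a, b in zip(positions, positions[1:])):
            continue
        score = sum(hit[1] for hit in combo if hit is not None)
        if score > best_score:
            best, best_score = [combo], score
        elif score == best_score:
            best.append(combo)
    return best_score, best


example = {
    "MT1": [(0, 4)],
    "MT2": [(75, 4), (120, 2)],  # the hit at 120 lies behind MT3
    "MT3": [(100, 1)],
    "MT4": [(128, 3)],
    "MT5": [(170, 2)],
}
score, combos = best_combinations(example)
```

In this example the MT2 hit at position 120 can only be combined with MT3 absent, so the combination using the hit at 75 wins with the full score of 14.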

-   MT2 is defined to be NNAG without the presence of 1 or 2 amino acid insertions between the NN and AG parts. The implementation allows for replacement of 1-3 of the residues, and will continue to search for other motifs even if no instance of MT2 is found.
-   The second part of MT4 is defined to be the pattern Y*ASK instead of Y**K, but again the implementation allows replacement of up to 3 of these residues. This makes the range of possible scores 0-16 instead of 0-14.
-   The absence of motif MT4 is not used to discard SDR candidates, but the effect on the overall score of its absence (5 out of a possible 16 matches) is significant in excluding matches which do not contain it. Additionally, the presence of the active site MT4 tyrosine is indicated for each result, as a significant indicator of possible SDR catalytic activity.

In a further preferred embodiment an enlargement or optimization of the algorithm is performed also on human extended SDRs. Thus it is taken into account that, compared to the other SDRs, often only a motif MT_(x)1 (TGxxGxxG) as well as a motif MT_(x)4 (YxxxK) is present, wherein MT_(x)1 is a variant of TGxxxGxG. For determining human extended SDRs with this enlarged algorithm motifs 2, 3 and 5 can even be missing.

In a particularly preferred embodiment the algorithm according to the invention comprises the import of a data set, e.g. from databases, organizing the data set by using the method according to the invention, ranking the SDR hits and further analyzing and managing the data of the detected hits, such as a cross-linking to databases, to BLAST or to other tools.

Subject matter of the invention is also a data carrier, particularly a diskette containing the method according to the invention and particularly the above described algorithm.

Whereas the method according to the invention itself already has a very high specificity and reliability in the selection of SDR candidates, the SDR candidates detected can be subjected to further evaluation criteria. These criteria are e.g. comparing the 3D-structure of the candidates detected with the 3D-structure of known SDR proteins or a standardized 3D-structure, which is derived from SDR candidates identified by the method according to the invention. Thus, in a further preferred embodiment of the invention the polypeptides classified as positive SDR candidates in the method according to the invention are subjected to another evaluation step in view of their three-dimensional structure in order to further improve the selectivity and specificity of the method. Thus it is possible to use known three-dimensional structures of SDR family members (cf. e.g. H. Jörnvall, Biochemistry 34 (1995), 6003-6013; U. Oppermann et al., Adv. Exp. Med. Biol. 414 (1997), 403-415 or J. Benach et al., J. Mol. Biol. 282 (1998), 383-399). However, it is also possible to determine the three-dimensional structures of the SDR candidates detected with the method according to the invention and to prepare a common comparative three-dimensional structure therefrom. This way it can be examined, e.g. whether the positive SDR candidates exhibit the co-factor binding site typical of SDRs. A further criterion may be the presence of amino acid Y at position 152±20, particularly ±10. Further, it is possible to compare the amino acid sequences detected with known SDR sequences, e.g. via an alignment.

After the sequences have been classified as SDRs it is also possible to search for further domains, e.g. membrane domains, in order to assign them to a certain type of tissue.

An important subgroup of SDRs are FabGs, which are derived from pathogens and which can be identified via the method according to the invention. Since FabGs are often strongly degenerated and thus exhibit a relatively low score (e.g. 9 or more) in the method according to the invention, it can be advantageous to examine possible FabG-SDR candidates in a second step in view of the presence of the following motif variations: MT_(y)2:VxVNNAG, wherein V can be replaced particularly by I, as well as MT_(y)5:PGFI, wherein F and/or I can be missing.
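A second-step check for the FabG motif variations could be sketched with two regular expressions. The pattern names `MTY2` and `MTY5` and the function `fabg_motifs` are hypothetical; the sketch only reports the first exact occurrence of each variant and does not take motif positions into account.

```python
import re

# MT_(y)2: VxVNNAG, where either V may be replaced by I.
MTY2 = re.compile(r"[VI].[VI]NNAG")
# MT_(y)5: PGFI, where F and/or I may be missing.
MTY5 = re.compile(r"PGF?I?")


def fabg_motifs(seq):
    """Return the matched text of MT_(y)2 and MT_(y)5, or None if absent."""
    m2 = MTY2.search(seq)
    m5 = MTY5.search(seq)
    return (m2.group(0) if m2 else None, m5.group(0) if m5 else None)


found = fabg_motifs("AAIAVNNAGLLLPGFIAA")  # ('IAVNNAG', 'PGFI')
```

Returning the matched text rather than a flag shows which form of MT_(y)5 (PG, PGF, PGI or PGFI) is present in a candidate.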

A list of FabG proteins which were identified by the method according to the invention is shown in Table 4. FabGs are involved in the lipid metabolism of bacteria and are particularly suitable for the development of antibiotics.

A further group of SDRs which can be identified by the method according to the invention are bacterial SDRs. Bacterial SDRs detected with the algorithm according to the invention are shown in Table 3.

Further, it is possible to detect production enzymes as well as thermostable enzymes with the method according to the invention.

In a most preferred embodiment, the so-called SDR Finder, the method according to the invention is based on the implementation of functional data both on the three-dimensional structure and on the biological function (NADP(H)-dependent enzymes). The implementation is hierarchically structured according to the smallest common denominator having a functional meaning. Contrary to known tools, the search is not for motifs but for SDR candidates, and thus also for those having a very low homology or hardly conserved core motifs. The search for SDR candidates according to the invention enables a considerably higher specificity. The SDR candidates detected are of biologically functional relevance. At the same time a greater number of hits is found due to the use of the smallest common denominator.

Further, it is possible to establish a ranking with the algorithm according to the invention, to export the data in different formats and to selectively search for species. Thus, the SDR_Finder represents an “all-in-one” analysis solution including various obtainable possibilities, particularly via the worldwide web. The implementation of hyperlinks to NCBI, EMBL and their tools (e.g. Blast, ClustalW, Pfam, PDB, Medline, OMIM) represents an “in silico” analysis/drug development software of modular structure which is particularly developed for SDR. Further modules which can be connected thereto are the examination of three-dimensional structures, the determination of active centres and the substrate docking simulation. The latter can also be implemented directly into the SDR_Finder and allow direct access, e.g. to 3D-databases and chemical libraries via the worldwide web.

In a preferred embodiment the SDR_Finder is equipped with fuzzy logic.

In addition, experimental data can be used, e.g. to evaluate the exchange of one amino acid in a motif regarding the functional consequences. This is of importance both for the individual adjustment of therapies and the evaluation of pathological problems or for the development of diagnostics, respectively.

Moreover, it is possible to enlarge the algorithm subgroup-specifically, as is shown herein for the FabG SDRs.

The method according to the invention can be used to verify whether sequences which are already classified as (putative) SDR sequences, e.g. by automatic alignment (BLAST), belong to the SDR family or not. Further, it can be used to search for and find new members of the SDR family or to search for and find new isoforms of SDR proteins. Therefore, the method of the invention provides additional information with regard to known sequences as well as to novel sequences. From the knowledge that a target sequence belongs to the SDR family as well as from the information obtained from the ranking, findings about substrates and functions can be obtained. An important selection criterion thereby is the druggability of the SDR candidates detected.

The method according to the invention can be used to detect e.g. human SDRs (human extended SDRs), animal SDRs, particularly mammalian SDRs, but also bacterial SDRs, FabG SDRs, fungal SDRs, SDRs of pathogens and SDRs of parasites, e.g. plant parasites.

The SDR proteins classified with the algorithm according to the invention thus can serve as a platform for novel drug development. Human SDR proteins can particularly serve as a starting point for the treatment of diseases or malfunctions of the body, whereas bacterial SDRs particularly provide a starting point for the development of novel antibiotics. Further, respective SDRs can serve for the development of antimycotics, pesticides, herbicides etc.

While the algorithm of the present invention preferably is used to search for protein sequences, it is also possible to convert the motifs given into nucleic acid sequences and screen nucleic acid databases. A method to convert amino acid sequences into nucleic acid sequences while considering the degeneration of the genetic code is e.g. given by H. Jörnvall, FEBS Letters 456 (1999), 85-88. A search on the nucleic acid level can preferably be used to preselect sequences, which are then confirmed by an alignment on the protein level.
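One way to picture the conversion of a motif to the nucleic acid level is to back-translate each residue into its degenerate codons of the standard genetic code. The sketch below is not the method of the cited reference; it is an illustration that covers only the residues named in MT1-MT5, reads a single frame on the coding strand, and uses the hypothetical names `CODONS`, `IUPAC` and `motif_to_dna_regex`.

```python
import re

# Degenerate codons (IUPAC notation) of the standard genetic code for the
# residues named in MT1-MT5; serine needs two codon families.
CODONS = {
    "T": ["ACN"], "G": ["GGN"], "N": ["AAY"], "A": ["GCN"],
    "S": ["TCN", "AGY"], "Y": ["TAY"], "K": ["AAR"], "P": ["CCN"],
    "x": ["NNN"],  # variable amino acid: any codon
}
# IUPAC base codes used above, expressed as regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "Y": "[CT]", "R": "[AG]"}


def motif_to_dna_regex(motif):
    """Back-translate an amino acid motif into a regex over DNA bases."""
    parts = []
    for residue in motif:
        alternatives = ["".join(IUPAC[base] for base in codon)
                        for codon in CODONS[residue]]
        if len(alternatives) == 1:
            parts.append(alternatives[0])
        else:
            parts.append("(?:" + "|".join(alternatives) + ")")
    return "".join(parts)


mt5_dna = motif_to_dna_regex("PG")  # 'CC[ACGT]GG[ACGT]'
```

A pattern built this way could serve for the preselection step, with hits then confirmed on the protein level as described.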

For the search on nucleic acid level these protein sequences are preferably converted to DNA sequences, in particular cDNA sequences, and used for the detection of further SDR candidates via a fuzzy logic or a hidden Markov model or via neural networks.

The method of the invention therefore also provides a tool for preselection of SDR candidates on the genomic level.

Preferably a ranking of the positive SDR candidates is performed e.g. according to the number of amino acids matching with motifs MT1-MT5. This way a hierarchy and/or an evolutionary relationship of the obtained SDR candidates can be obtained.

In a particularly preferred embodiment the target sequences classified as positive SDR candidates contain at least the core SDR motifs MT1 and MT4.

By hierarchically classifying the verification of the individual core SDR motifs several levels to detect SDR proteins can be obtained.

By using the algorithm according to the invention the search for SDR candidates and consequently the development of pharmaceuticals can be decisively enhanced. So far, for the production of pharmaceuticals, in vitro tissue cultures were admixed with different substrates. From cultures wherein a certain substrate was converted, the target protein was isolated. According to the invention, this step and thus the knowledge of a substrate for the development of inhibitors or for the development of pharmaceuticals is not necessary. Moreover, starting from the sequence found, a modulator, in particular an inhibitor or activator, can be derived. This modulator can e.g. be derived from known modulators of other, in particular of related, SDR proteins. Suitable substrates, related functions and tissue distribution for 17β HSD isoforms are described e.g. by H. Peltoketo et al., J. Molecular Endocrinology 23 (1999), 1-11. Further, it is possible to derive a modulator from the 3D-structure of the SDR sequence. Such a 3D-structure can be obtained experimentally, e.g. by X-ray crystallography, or by computer based calculations, e.g. ab initio, force field, or rule based methods. Further, by inhibiting the active site of the SDR protein the function thereof can be determined.

The searching for SDR family members and ranking is also applicable to evaluate lead candidates for possible inhibitors or modifiers of a specific enzyme. Leads may be derived from metabolites of evolutionarily closely related or very distant enzymes from other species, if the same metabolite may not be found in the respective target organism. The evolutionary relationship of SDRs and their distinction from MDRs (medium chain dehydrogenases) and AKRs is described e.g. by H. Jörnvall et al., FEBS Letters 445 (1999), 261-264 and by T. M. Penning, Endocrine Rev. 18 (3) (1997) 281-305.

SDR enzymes are often involved in intermediary metabolisms, as well as in hormone and mediator metabolisms.

Substrates of known SDR proteins include e.g. steroids, such as estrone/estradiol, cortisone/cortisol and testosterone/3a-androstenediol. Thus, after classifying a sequence as SDR sequence, functional tests for steroid substrates result in higher hit rates.

Further substrates of SDR proteins are UDP-glucose, UDP-N-acetylglucosamine, sepiapterin, dihydropteridine, R-3-OH-butyrate, dienoyl CoA, trans-Enoyl CoA, fatty acids and L-3-OH-acyl CoA. These substrates are particularly converted by SDR enzymes which are involved in the intermediary metabolism. Further substrates of SDR proteins, particularly of SDR enzymes which are involved in hormone, mediator and xenobiotic metabolisms, are several hydroxy steroids, e.g. 3-beta-hydroxy steroids, 11-beta-hydroxy steroids or 17-beta-hydroxy steroids, as well as prostaglandins and retinoids.

Further, searching SDRs, ranking and comparing evolutionary patterns can also be used to detect clinically relevant polymorphisms and/or single nucleotide polymorphisms (SNPs). This approach can be used to characterize disease mechanisms as well as metabolism of xenobiotics, e.g. drug metabolism.

The identification of SDR members, ranking and comparing evolutionary patterns also allows for the identification of structure-function relationships. These structure-function relationships are a key for identification of substrates of ORFs with unknown functions.

Within a lead oriented characterization, first the binding of a positive SDR candidate is evaluated. Starting from the binding, a modulator, e.g. an inhibitor or activator, can be developed. Useful information for developing an inhibitor can be obtained from protein sequence alignment of full-length sequences, e.g. by comparison with known SDRs. Further, valuable information can be obtained from expressed sequence tags (EST) and gene sequence comparison. The procedure using the algorithm according to the invention allows for a great reduction of possible modulator candidates to be analysed and practically excludes target sequences which are not SDR sequences. Therefore, an analysis of the functions in vitro or in vivo can be performed with much less effort than in the state of the art due to the reduced number of compounds to be tested. While in the methods according to the state of the art the substrate often must be known, this knowledge is not essential for developing modulators and/or drugs according to the invention. It is even possible to derive possible substrates in a subsequent step from the functions of the SDR enzymes found according to the invention. Ligands can be derived according to the procedure described by G. R. Lenz et al., DDT, 5 (4) (2000), 145-156.

The validation of the potential SDRs found according to the algorithm of the invention, which can be used as new targets for drug development, can then be performed by experimental biochemical methods, such as high-throughput function screening for function identification, ultra high-throughput screening for lead compounds, transfection assays, knock out experiments, microarrays, tissue expression, cDNA arrays or analysis of disease in animal or in vitro model systems. However, it is also possible to use virtual methods using e.g. computers for validation of the new targets, e.g. by molecular homology modelling or substrate docking simulations.

Suitable strategies include e.g. gene expression of an identified SDR protein to obtain the protein molecule and subsequently performing biological functional assays and observing the behaviour of the cell.

Alternatively, the 3D-structure may be derived from the SDR sequence and an inhibitor for the active site provided. Using the inhibitor the function of the SDR within an organism can be evaluated.

Low molecular weight inhibitors for SDR enzymes, which can be used as a starting point for developing new or modified inhibitors, in particular inhibitors for newly identified SDR enzymes, include:

1) Steroidal-based inhibitors like steroid carboxylates, acrylates, enolates 3,4- and 16,17-fused ring pyrazoles, 3 alpha, 17-beta or 20-beta-spiro-oxiranes as well as steroidal spirolactones, progestins, ursodeoxycholate, synthetic analogs of estrone sulfate and estrone-3-amino derivatives.

2) Inhibitors based on flavonoids and dihydropterin derivatives.

3) Inhibitors based on polyphenols and derivatives of2,3-dihydroxy-1-naphthoic acids likegossypol(1,1′,6,6′,7,7′-hexahydroxy-5-5′-diisopropyl-3,3′-dimethyl-2,2′-binaphthalene-8,8′-dicarbaldehyde).

4) Inhibitors based on glycyrrhizin ((3beta,20beta)-29-hydroxy-11,29-dioxoolean-12-en-3-yl 2-O-beta-D-glucopyranuronosyl-alpha-D-glucopyranosiduronic acid) and components of enzymatically hydrolysed licorice extract like 3-O-beta-D-glucuronopyranosyl-24-hydroxy-18beta-glycyrrhetinic acid, 3-O-beta-D-glucuronopyranosyl-18beta-glycyrrhetinic acid and 3-O-beta-D-glucuronopyranosyl-18beta-liquiritic acid, monoglycosylated derivatives of glycyrrhizin as well as carbenoxolone.

5) Pharmaceutically acceptable salts of the above-mentioned molecules, such as alkali metal (e.g. sodium), alkaline earth metal (e.g. magnesium) or ammonium salts, as well as salts of organic carboxylic acids, such as acetic, citric, oxalic, lactic, tartaric, malic, isethionic, lactobionic, ascorbic and succinic acids; organic sulfonic acids, such as methanesulfonic, ethanesulfonic, benzenesulfonic and p-toluenesulfonic acids; and inorganic acids, such as hydrochloric, sulfuric, phosphoric and sulfamic acids.

Further candidates for inhibitors are chalcones (cf. Life Sci 68 (7) (2001) 751-761) as well as phytoestrogens (cf. Life Sci 66 (14) (2000) 1281-1291) and frenolicin and its derivatives.

Further, inhibitors can be derived from 3D-structures of the SDRs found, confirmed, identified or verified with the method of this invention, as is described e.g. by Liao et al., Structure, Vol. 9 (2001) 19-27.

Since SDR enzymes, in particular human SDR enzymes, have been found to be involved in many pathways of the body, they are outstanding targets for developing new drugs. In particular, human SDR enzymes have been found to be involved in intermediary metabolism, lipid mediator/hormone metabolism or xenobiotic phase I metabolism. On the other hand, SDR enzymes often constitute pathogenic factors causing diseases. Thus, e.g. the AME syndrome is associated with 11β HSD-2, bile acid metabolism is associated with 3β HSD, polycystic kidney disease is associated with Ke6 (17β HSD-8) and Alzheimer's disease is associated with ERAB (17β HSD-10).

Further diseases which can be affected by influencing, modulating or inhibiting SDRs comprise e.g. DHPR deficiency, phenylketonuria, dienoyl CoA reductase deficiency, galactosemia III, tetrahydrobiopterin deficiency, adrenal hyperplasia, adrenogenital syndrome, 11-oxoreductase deficiency, apparent mineralocorticoid excess syndrome, ovarian/breast cancer, male pseudohermaphroditism, Zellweger syndrome, pregnancy/ovarian cancer, polycystic kidney disease, Alzheimer's disease, retinitis punctata albescens, retinitis pigmentosa, Down's syndrome, arterial hypertension, oncogenes, follicular lymphoma, hepatocarcinogenesis, aging related hormone deficiencies and immunity in general.

Since many of the SDR enzymes are bidirectional (reversible oxidoreduction) depending on the environment, it is also possible to provide a means for selectively enhancing one of the enzymatic reactions, i.e. oxidation or reduction, or for reversing the action observed.

Thus, providing new SDR sequences and modulators therefor, as described above, allows for the preparation of drugs or pharmaceutical agents which can be used to control many different diseases. In particular, drugs for treatment of cancer, e.g. breast cancer or prostate cancer, obesity, diabetes, fertility, osteoporosis, glucose metabolism, or conditions related to aging can be prepared. Further applications include steroid resistance, in particular estrogen resistance and glucocorticoid resistance.

Further, SDR proteins and in particular hydroxysteroid dehydrogenases (HSDs) are outstanding targets for tissue-specific modulation of hormone-dependent or hormone-sensitive diseases, e.g. cancer, in particular prostate or breast cancer.

The present invention is in particular useful for providing a pharmaceutical agent for affecting immune regulation by developing a modulator for 17β HSD type 3, 17β HSD type 7, 17β HSD type 8, 17β HSD type 10, 11β HSD-1, CR1, UDP glucose epimerase, SDR_SRL, AF067174, AF151840, AF151844, AF0078850, Fvt-1, HEP-27, DKFZ_ORF, WWOX_ORF, or CR3; a pharmaceutical agent for affecting autoimmunity by developing a modulator for 17β HSD-3, 17β HSD-8, 11β HSD-1, AF057034, U89717, CR1, AF0078850, HEP-27, or CR-3; a pharmaceutical agent for wound healing or partial recovery by developing a modulator for 17β HSD-3, 17β HSD-8, 11β HSD-1, U89717, CR1, AF0078850, HEP-27, or CR-3; a pharmaceutical agent for treatment of leukemia by developing modulators for 17β HSD-10 or Fvt-1; a pharmaceutical agent for apoptosis regulation by developing a modulator for 17β HSD-10, U89717 or SDR_SRL; a pharmaceutical agent for affecting immune response by providing a modulator for AF016509; a pharmaceutical agent for the treatment of cancer by providing modulators for AF016509; a pharmaceutical agent for affecting cell growth by providing a modulator for U89717; a pharmaceutical agent for the treatment of lung carcinoma by providing a modulator for SDR_SRL; or a pharmaceutical agent for the regulation of inflammation or vasculitis by providing a modulator for DKFZ_ORF.

The SDR candidates detected according to the invention can be used particularly for the production of inhibitors, such as antibodies at the protein level or antisense molecules at the nucleic acid level. Moreover, it is possible to provide diagnostics by using the SDR candidates detected according to the invention, e.g. in order to show a malfunction.

An important aspect of the present invention in view of the development of new drugs for the diagnosis and/or treatment of a disease is that the inventive approach aims at a target family, i.e. SDRs, and not at a specific disease. This allows for the development of a number of drugs which all influence the same target family. By this approach the number of experiments, the effort and the money necessary to develop a new drug can be significantly reduced, since many results can be used in parallel for further members of the same target family, leading to further new drugs for different medical applications. Further, this approach allows for affecting a target which is known or suspected to be highly relevant for a person's health. In contrast to the classical approach, in which a suitable target must be identified starting from a disease, this time- and effort-consuming procedure is not necessary with the inventive approach.

The invention is further elucidated by the following figures and tables, wherein:

FIG. 1 represents the search engine for SDR candidates. The target sequence is compared to the specified core SDR motif, preferably in order from the N-terminus to the C-terminus.

FIG. 2 shows flow charts for the preferred implementation of the algorithm. FIG. 2a shows a flow chart for data processing, while FIG. 2b shows a flow chart for the algorithm.

FIG. 3 depicts the development of pharmaceuticals on the basis of the SDR search according to the invention. The combination of virtual screening and classifying sequences as belonging to the SDR family with the development of new drugs, as provided herein, is an efficient novel drug development strategy. By using the search results of the virtual SDR search, new targets are obtained, from which drugs can be derived by various procedures.

FIG. 4 shows an alignment of human SDRs. 39 human SDR proteins were found in a database using the algorithm according to the invention. Amino acids that are highly conserved throughout the various SDR proteins are shaded in grey. As can be seen from this figure, the motifs selected for the algorithm of the invention are present in most of the human SDRs.

Tab. 1 shows human and/or vertebrate SDRs detected with the algorithm according to the invention. The detected SDRs are also subject matter of this invention. Further, Table 1 includes an EST search for each SDR detected, with which the corresponding function and localization in tissue can be determined.

Tab. 2 shows mouse SDRs detected with the method according to the invention and the results of EST searches using these mouse SDRs in human tissue. Thus, using SDRs of various species, e.g. mammals, allows for localization and identification of new SDRs, in particular human SDRs, on a genomic level. A preselection and/or identification of the SDR employed can be performed with the method according to the invention.

Tab. 3 shows bacterial SDRs which were detected with the method according to the invention. Such bacterial SDRs are particularly suitable for the development of novel antibiotics.

Tab. 4 shows FabG_ proteins, i.e. an SDR subgroup. With the method according to the invention it is possible to specifically identify desired subgroups by selection of further criteria in a second search step.

Tab. 5 shows SDRs from different fungi.
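By way of illustration only, the comparison of a target sequence with the core SDR motifs and the ranking of positive candidates could be sketched in Python as follows. The sketch covers only the stricter variant in which MT1 and MT4 must match exactly; the replacement of 0 to 2 amino acids (MV1, MV4) is not implemented. The function names `score_sdr` and `rank_candidates`, the reading of "position 90-110 relative to MT1" as an offset from the first residue of the MT1 match, and the all-or-nothing scoring of MT2 and MT5 are assumptions made for this sketch and are not taken from the preferred implementation of FIG. 2.

```python
import re
from typing import Dict, List, Optional, Tuple

# Core motifs; "x" (any residue) is written as "." in the regular expressions.
MT1 = re.compile(r"TG...G.G")        # TGxxxGxG, 4 defined residues
MT2 = re.compile(r"NN.{0,2}AG")      # NN(0-2:x)AG, 4 defined residues
MT4 = re.compile(r"S.{11,52}Y...K")  # S(11-52:x)YxxxK, 3 defined residues
MT5 = re.compile(r"PG")              # PG, 2 defined residues


def score_sdr(seq: str) -> Optional[int]:
    """Count matched motif residues (7 to 14) for one protein sequence.

    Returns None when MT1 or MT4 is missing, i.e. the sequence is not
    classified as a positive SDR candidate in this simplified sketch.
    """
    seq = seq.upper()
    m1 = MT1.search(seq)
    if m1 is None:
        return None
    # Motifs are searched in order from the N-terminus to the C-terminus.
    m4 = MT4.search(seq, m1.end())
    if m4 is None:
        return None
    score = 4 + 3  # MT1 and MT4 matched exactly
    if MT2.search(seq, m1.end()):
        score += 4
    # Assumption: MT3 is an N located 90 to 110 residues after the start of MT1.
    if "N" in seq[m1.start() + 90 : m1.start() + 111]:
        score += 1
    if MT5.search(seq, m4.end()):
        score += 2
    return score


def rank_candidates(seqs: Dict[str, str]) -> List[Tuple[str, int]]:
    """Keep positive candidates and rank them by number of matched residues."""
    scored = [(name, score_sdr(s)) for name, s in seqs.items()]
    hits = [(name, sc) for name, sc in scored if sc is not None]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

In this strict variant every positive candidate already reaches 7 of the 14 motif amino acids through MT1 and MT4, so the threshold only becomes selective once replacements in MV1 and MV4 are allowed. A call such as `rank_candidates({"id1": sequence1, "id2": sequence2})` returns the positive candidates with the highest number of matching amino acids first.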

1. A method for identifying or verifying members of the short chain dehydrogenase (SDR) family comprising the steps (a) providing a target sequence of molecules to be classified, (b) comparing said target sequence with core SDR motifs selected from (i) MV1 being derived from the motif MT1:TGxxxGxG by replacement of 0 to 2 amino acids, (ii) MT2:NN(0-2:x)AG, (iii) MT3:N, located at a position 90-110 relative to MT1, (iv) MV4 being derived from the motif MT4:S(11-52:x)YxxxK by replacement of 0-2 amino acids and (v) MT5:PG, (c) determining positive SDR candidates containing (i) at least the core SDR motifs MV1 and MV4 and (ii) at least 7 of the 14 amino acids contained in the motifs MT1, MT2, MT3, MT4 and MT5 and (d) classifying positive SDR candidates as belonging to the SDR family.
2. The method according to claim 1, further comprising a step (e) ranking of the positive SDR candidates obtained according to the number of amino acids matching with motifs MT1, MT2, MT3, MT4 and MT5.
3. The method according to claim 1, wherein in step (b) the target sequence is compared with core SDR motifs selected from (i) MT1:TGxxxGxG, (ii) MT2:NN(0-2:x)AG, (iii) MT3:N, located at position 90-110 relative to MT1, (iv) MT4:S(11-52:x)YxxxK and (v) MT5:PG, and wherein in step (c) positive SDR candidates are determined containing (i) at least the core SDR motifs MT1 and MT4 and (ii) at least 7 of the 14 amino acids contained in the motifs MT1, MT2, MT3, MT4 and MT5.
4. The method according to claim 1, wherein in step (c) positive SDR candidates are determined containing (i) at least the core SDR motifs MV1, MV4 and one of MT2, MT3 and MT5 and (ii) at least 7 of the 14 amino acids contained in the motifs MT1, MT2, MT3, MT4 and MT5.
5. The method according to claim 1, wherein in step (c) positive SDR candidates are determined containing (i) the core SDR motifs MV1, MV4, MT2 and MT3 or MV1, MV4, MT2 and MT5 or MV1, MV4, MT2, MT3 and MT5.
6. The method according to claim 1, wherein positive SDR candidates are determined containing the core SDR motifs MV1, MV4, MT2, MT3 and MT5.
7. The method according to claim 1, wherein in step (c) positive candidates are determined containing at least 9 of the 14 amino acids contained in the motifs MT1, MT2, MT3, MT4, and MT5.
8. The method according to claim 1, wherein MT2 is defined as NNAG.
9. The method according to claim 1, wherein MV4 is derived from the motif MT′4:S(11-52:x)YxASK by replacement of 0-2 amino acids.
10. The method according to claim 9, wherein in step (c) positive candidates are determined containing at least 9 of the 16 amino acids contained in the core motifs used.
11. The method according to claim 1, wherein MT2 and/or MT5 are extended for identifying or verifying FabG_SDRs, wherein MT_(y)2:VxVNNAG, wherein V can be replaced and MT_(y)5:PGFI, wherein F and/or I are used as search motif.
12. The method according to claim 1, further comprising one or more of the following further steps: (i) three-dimensional structure comparison and (ii) biological function analysis.
13. A member of the short-chain dehydrogenase (SDR) family identified with the method according to claim 1.
14. The SDR according to claim 13, wherein it is selected from the SDRs shown in Tables 1-5.
15. A method for providing modulators for members of the short chain dehydrogenase (SDR) family comprising the steps (a) providing one or more target sequences of members of the short chain dehydrogenase family based on an algorithm using core SDR motifs for searching members of the SDR family and (b) providing modulators, which enhance or inhibit the activity of the members of the short chain dehydrogenase family.
16. The method according to claim 15, wherein step (a) comprises the steps (a) providing a target sequence of molecules to be classified, (b) comparing said target sequence with core SDR motifs selected from (i) MV1 being derived from the motif MT1:TGxxxGxG by replacement of 0 to 2 amino acids, (ii) MT2:NN(0-2:x)AG, (iii) MT3:N, located at a position 90-110 relative to MT1, (iv) MV4 being derived from the motif MT4:S(11-52:x)YxxxK by replacement of 0-2 amino acids and (v) MT5:PG, (c) determining positive SDR candidates containing (i) at least the core SDR motifs MV1 and MV4 and (ii) at least 7 of the 14 amino acids contained in the motifs MT1, MT2, MT3, MT4 and MT5 and (d) classifying positive SDR candidates as belonging to the SDR family.
17. The method according to claim 15, wherein in step (b) a protein sequence alignment with known SDR sequences is performed for pre-selecting possible modulators.
18. A method for evaluation of lead-candidates for possible modulators of a member of the SDR family comprising the steps (a) providing one or more target sequences of members of the short chain dehydrogenase family based on an algorithm using core SDR motifs for searching members of the SDR family, (b) ranking the target sequences according to the number of amino acids matching with the core SDR motifs used and (c) deriving lead-candidates from metabolites of evolutionary related SDR enzymes.
19. The method according to claim 18, wherein step (a) comprises the steps (a) providing a target sequence of molecules to be classified, (b) comparing said target sequence with core SDR motifs selected from (i) MV1 being derived from the motif MT1:TGxxxGxG by replacement of 0 to 2 amino acids, (ii) MT2:NN(0-2:x)AG, (iii) MT3:N, located at a position 90-110 relative to MT1, (iv) MV4 being derived from the motif MT4:S(11-52:x)YxxxK by replacement of 0-2 amino acids and (v) MT5:PG, (c) determining positive SDR candidates containing (i) at least the core SDR motifs MV1 and MV4 and (ii) at least 7 of the 14 amino acids contained in the motifs MT1, MT2, MT3, MT4 and MT5 and (d) classifying positive SDR candidates as belonging to the SDR family.
20. A method for providing a pharmaceutical agent comprising the steps (a) providing one or more target sequences of members of the short chain dehydrogenase family based on an algorithm using core SDR motifs for searching members of the SDR family, (b) providing modulators, which enhance or inhibit the activity of the members of the short chain dehydrogenase family and (c) formulating said modulators as pharmaceutical agent.
21. The method according to claim 20, wherein step (a) comprises the steps (a) providing a target sequence of molecules to be classified, (b) comparing said target sequence with core SDR motifs selected from (i) MV1 being derived from the motif MT1:TGxxxGxG by replacement of 0 to 2 amino acids, (ii) MT2:NN(0-2:x)AG, (iii) MT3:N, located at a position 90-110 relative to MT1, (iv) MV4 being derived from the motif MT4:S(11-52:x)YxxxK by replacement of 0-2 amino acids and (v) MT5:PG, (c) determining positive SDR candidates containing (i) at least the core SDR motifs MV1 and MV4 and (ii) at least 7 of the 14 amino acids contained in the motifs MT1, MT2, MT3, MT4 and MT5 and (d) classifying positive SDR candidates as belonging to the SDR family.
22. The method according to claim 20, wherein step (b) comprises the steps (a) providing one or more target sequences of members of the short chain dehydrogenase family based on an algorithm using core SDR motifs for searching members of the SDR family and (b) providing modulators, which enhance or inhibit the activity of the members of the short chain dehydrogenase family.
23. The method according to claim 20, wherein a modulator is provided, which enhances the activity of the members of the short chain dehydrogenase family.
24. The method according to claim 20, wherein a modulator is provided, which inhibits the activity of the members of the short chain dehydrogenase family.
25. The method according to claim 20, wherein the validation of a modulator or a function of an SDR enzyme found with an algorithm using core SDR motifs is performed with biochemical methods.
26. The method according to claim 20, wherein expressed sequence tags and gene sequence comparison are used to provide a function of the member of the short chain dehydrogenase family, which has been identified or verified with an algorithm using core SDR motifs.
27. The method according to claim 20, wherein a modulator or a function of an SDR enzyme found with an algorithm using core SDR motifs is validated by high-throughput function screening for function identification, UHTS for lead compounds, molecular homology modelling, substrate docking simulations, tissue expression, cDNA arrays or analysis of disease in animal or in vitro model systems.
28. The method according to claim 20, wherein a human SDR enzyme is provided and the pharmaceutical agent is applied for therapeutic or diagnostic purposes.
29. The method according to claim 28, wherein the human SDR enzyme is selected from the human SDRs shown in Table 1 or 2.
30. The method according to claim 20, wherein an SDR from a pathogen and/or a fungus is provided to obtain a highly specific pharmaceutical agent.
31. The method according to claim 30, wherein the SDR is selected from the SDRs shown in Table 3, 4 or 5.
32. The method according to claim 20, wherein an SDR enzyme with high homology is provided, which constitutes an essential enzyme.
33. The method according to claim 20, wherein an SDR enzyme with low homology or high divergence between different species is provided, which allows for a species specific modulation.
34. A pharmaceutical agent obtainable by a method according to claim 20.
35. The pharmaceutical agent according to claim 34 for the prophylaxis, treatment and/or diagnosis of diseases.
36. The pharmaceutical agent according to claim 34, which is a fungicide or antibiotic.
37. A method for detection of clinically relevant polymorphisms or single nucleotide polymorphisms comprising the steps (a) providing one or more target sequences of members of the short chain dehydrogenase family based on an algorithm using core SDR motifs for searching members of the SDR family, (b) ranking the members of the short chain dehydrogenase family according to the number of amino acids matching with the core SDR motifs applied, and (c) comparing evolutionary patterns within the SDR enzymes.
38. The method according to claim 37, wherein disease mechanisms are characterised.
39. The method according to claim 37, wherein metabolisms of xenobiotics are characterised.
40. The method according to claim 37, wherein structure-function relationships are identified and/or substrates of SDR members with unknown function are identified.
41. The method according to claim 20, wherein a pharmaceutical agent for affecting immune regulation is provided by developing a modulator for 17β HSD type 3, 17β HSD type 7, 17β HSD type 8, 17β HSD type 10, 11β HSD-1, CR1, UDP glucose epimerase, SDR_SRL, AF067174, AF151840, AF151844, AF0078850, Fvt-1, HEP-27, DKFZ_ORF, WWOX_ORF, or CR3.
42. The method according to claim 20, wherein a pharmaceutical agent for affecting autoimmunity is provided by developing a modulator for 17β HSD-3, 17β HSD-8, 11β HSD-1, AF057034, U89717, CR1, AF0078850, HEP-27, or CR-3.
43. The method according to claim 20, wherein a pharmaceutical agent for wound healing or partial recovery is provided by developing a modulator for 17β HSD-3, 17β HSD-8, 11β HSD-1, U89717, CR1, AF0078850, HEP-27, or CR-3.
44. The method according to claim 20, wherein a pharmaceutical agent for treatment of leukemia is provided by developing modulators for 17β HSD-10 or Fvt-1.
45. The method according to claim 20, wherein a pharmaceutical agent for apoptosis regulation is provided by developing a modulator for 17β HSD-10, U89717, SDR_SRL; or for providing a pharmaceutical agent for affecting immune response by providing a modulator for AF016509, or providing a pharmaceutical agent for the treatment of cancer by providing modulators for AF016509, or providing a pharmaceutical agent for affecting cell growth by providing a modulator for U89717, or providing a pharmaceutical agent for the treatment of lung carcinoma by providing a modulator for SDR_SRL, or providing a pharmaceutical agent for the regulation of inflammation or vasculitis by providing a modulator for DKFZ_ORF.